-
Notifications
You must be signed in to change notification settings - Fork 230
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mpi: Enhance flexibility for custom topologies #2134
Conversation
0a8ecf0
to
c8d3b54
Compare
Codecov Report
@@ Coverage Diff @@
## master #2134 +/- ##
==========================================
- Coverage 87.10% 87.07% -0.03%
==========================================
Files 223 223
Lines 39813 39832 +19
Branches 5166 5169 +3
==========================================
+ Hits 34679 34685 +6
- Misses 4556 4569 +13
Partials 578 578
|
devito/mpi/distributed.py
Outdated
# Decompose the processes remaining for allocation to prime factors | ||
alloc_procs = np.prod([i for i in items if i != '*']) | ||
remprocs = int(input_comm.size // alloc_procs) | ||
prime_factors = primefactors(remprocs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks a bit overly intricate.
If you use factorint(remproc)
you get directly the dict of factors that you just need to split i.e
factors = factorint(remprocs)
vals = [k for (k, v) in factors.items() for _ in range(v)]
vals = vals + [1 for _ in range(nstars - len(vals))]
startvals = (*vals[:nstart-1], prod(vals[nstars-1:])
and should be it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, not realy, since you do not prioritise neither the outermost dimension, nor the overall balance:
e.g. your approach will give:
FAILED tests/test_mpi.py::TestDistributor::test_custom_topology_3d_dummy[24-topology20-dist_topology20] - assert (2, 2, 6) == (6, 2, 2)
FAILED tests/test_mpi.py::TestDistributor::test_custom_topology_3d_dummy[32-topology21-dist_topology21] - assert (2, 2, 8) == (4, 4, 2)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's just a minor change to the last line to cycle instead of multiply the remainder
split = np.array_split(vals, nstar)
starvals = np.prod([np.pad(s, (0, nstar-len(s)), constant_values=1) for s in split], dims=1)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And pad (nstar-len(s), 0)
if you want it left heavy instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I pushed a simplified version, lemme know if you like it better.
I would dare to say that I consider your approach less readable, at least for my eyes.
btw, I could not make it work?
Would you mind to try locally your code?
15761fe
to
3ae6798
Compare
3ae6798
to
958579c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Several improvements possible in my opinion
devito/mpi/distributed.py
Outdated
for index, value in zip(int_pos, int_vals): | ||
processed[index] = value | ||
|
||
if dd_list: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you need this if?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and it's probably doable with a comprehension
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really? Like how without iterating O(len(processed))?
devito/mpi/distributed.py
Outdated
the outermost dimension | ||
|
||
Assuming N=6 and requested topology is `('*', '*', 1)`, | ||
since there is no integer k, so that k*k=6, we resort to the closest factors to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you rewrite this (grammarly maybe)? it's not very clear to me at first glance
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I simplified it to let the examples speak by themselves
devito/mpi/distributed.py
Outdated
# Start by using the max prime factor at the first starred position, | ||
# then cyclically-iteratively decompose as evenly as possible until | ||
# decomposing to the number of `remprocs` | ||
while remprocs != 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpicking: if one passes a dummy comm with .size = 0, this remains entrapped in an infinite loop. Safer to use : while remprocs > 1
Also, I see an inconsisten use of _
in variable names
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched to rem_procs
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think passing 0 is bad use, maybe we can jump to the assert statement directly ?
Asking cutom topo on 0 ranks looks bad to me
devito/mpi/distributed.py
Outdated
# Decompose the processes remaining for allocation to prime factors | ||
alloc_procs = np.prod([i for i in items if i != '*']) | ||
remprocs = int(input_comm.size // alloc_procs) | ||
prime_factors = primefactors(remprocs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of having this both here and inside the while loop, you should have it only once, at the top of the while loop...
which, if necessary, you can turn into:
while True:
if rem_procs > 1:
break
...
thus emulating a do-while loop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably same story with rem_procs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dropped and improved, lemme know if it is fine
devito/mpi/distributed.py
Outdated
try: | ||
assert np.prod(processed) == input_comm.size | ||
except: | ||
raise ValueError("Invalid `topology`", processed, " for given nprocs:", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if it's an assert, why bothering with this? Just leave the assert alone and that's it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it will be more helpful as a message to the user?
I can drop if you feel so
5fd03f7
to
83e8640
Compare
14b1fdc
to
f26e986
Compare
Merged, thanks |
This PR enhances the available custom topology MPI decompositions
The requirements for the number of stars to evenly divide the number of MPI procs are dropped.
You can see some topology to decomposition combos in the added tests